<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article# " lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>imputing-missing-values-through-various-strategies | 绿萝间</title>
<link href="../assets/css/all-nocdn.css" rel="stylesheet" type="text/css">
<link href="../assets/css/ipython.min.css" rel="stylesheet" type="text/css">
<link href="../assets/css/nikola_ipython.css" rel="stylesheet" type="text/css">
<meta name="theme-color" content="#5670d4">
<meta name="generator" content="Nikola (getnikola.com)">
<link rel="alternate" type="application/rss+xml" title="RSS" href="../rss.xml">
<link rel="canonical" href="https://muxuezi.github.io/posts/imputing-missing-values-through-various-strategies.html">
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    },
    displayAlign: 'center', // Change this to 'center' to center equations.
    "HTML-CSS": {
        styles: {'.MathJax_Display': {"margin": 0}}
    }
});
</script><!--[if lt IE 9]><script src="../assets/js/html5.js"></script><![endif]--><meta name="author" content="Tao Junjie">
<link rel="prev" href="getting-sample-data-from-external-sources.html" title="getting-sample-data-from-external-sources" type="text/html">
<link rel="next" href="kernel-pca-for-nonlinear-dimensionality-reduction.html" title="kernel-pca-for-nonlinear-dimensionality-reduction" type="text/html">
<meta property="og:site_name" content="绿萝间">
<meta property="og:title" content="imputing-missing-values-through-various-strategies">
<meta property="og:url" content="https://muxuezi.github.io/posts/imputing-missing-values-through-various-strategies.html">
<meta property="og:description" content="处理缺失值¶








实践中数值计算不可或缺，好在有很多方法可用，这个主题将介绍其中一些。不过，这些方法未必能解决你的问题。
scikit-learn有一些常见的计算方法，它可以对现有数据进行变换填补NA值。但是，如果数据集中的缺失值是有意而为之的——例如，服务器响应时间超过100ms——那么更合适的方法是用其他包解决，像处理贝叶斯问题的PyMC，处理风险模型的lifelines，或者自己">
<meta property="og:type" content="article">
<meta property="article:published_time" content="2015-07-27T14:58:16+08:00">
<meta property="article:tag" content="CHS">
<meta property="article:tag" content="ipython">
<meta property="article:tag" content="Machine Learning">
<meta property="article:tag" content="Python">
<meta property="article:tag" content="scikit-learn cookbook">
</head>
<body>
<a href="#content" class="sr-only sr-only-focusable">Skip to main content</a>

<!-- Menubar -->

<nav class="navbar navbar-inverse navbar-static-top"><div class="container">
<!-- This keeps the margins nice -->
        <div class="navbar-header">
            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-navbar" aria-controls="bs-navbar" aria-expanded="false">
            <span class="sr-only">Toggle navigation</span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            </button>
            <a class="navbar-brand" href="https://muxuezi.github.io/">

                <span id="blog-title">绿萝间</span>
            </a>
        </div>
<!-- /.navbar-header -->
        <div class="collapse navbar-collapse" id="bs-navbar" aria-expanded="false">
            <ul class="nav navbar-nav">
<li>
<a href="../archive.html">Archive</a>
                </li>
<li>
<a href="../categories/">Tags</a>
                </li>
<li>
<a href="../rss.xml">RSS feed</a>

                
            </li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
    <a href="imputing-missing-values-through-various-strategies.ipynb" id="sourcelink">Source</a>
    </li>

                
            </ul>
</div>
<!-- /.navbar-collapse -->
    </div>
<!-- /.container -->
</nav><!-- End of Menubar --><div class="container" id="content" role="main">
    <div class="body-content">
        <!--Body content-->
        <div class="row">
            
            
<article class="post-text h-entry hentry postpage" itemscope="itemscope" itemtype="http://schema.org/Article"><header><h1 class="p-name entry-title" itemprop="headline name"><a href="#" class="u-url">imputing-missing-values-through-various-strategies</a></h1>

        <div class="metadata">
            <p class="byline author vcard"><span class="byline-name fn">
                    Tao Junjie
            </span></p>
            <p class="dateline"><a href="#" rel="bookmark"><time class="published dt-published" datetime="2015-07-27T14:58:16+08:00" itemprop="datePublished" title="2015-07-27 14:58">2015-07-27 14:58</time></a></p>
            
        <p class="sourceline"><a href="imputing-missing-values-through-various-strategies.ipynb" id="sourcelink">Source</a></p>

        </div>
        

    </header><div class="e-content entry-content" itemprop="articleBody text">
    <div tabindex="-1" id="notebook" class="border-box-sizing">
    <div class="container" id="notebook-container">

<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="处理缺失值">处理缺失值<a class="anchor-link" href="imputing-missing-values-through-various-strategies.html#%E5%A4%84%E7%90%86%E7%BC%BA%E5%A4%B1%E5%80%BC">¶</a>
</h2>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>实践中数值计算不可或缺，好在有很多方法可用，这个主题将介绍其中一些。不过，这些方法未必能解决你的问题。</p>
<p>scikit-learn有一些常见的计算方法，它可以对现有数据进行变换填补<code>NA</code>值。但是，如果数据集中的缺失值是有意而为之的——例如，服务器响应时间超过100ms——那么更合适的方法是用其他包解决，像处理贝叶斯问题的PyMC，处理风险模型的lifelines，或者自己设计一套方法。</p>
<!-- TEASER_END -->
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Getting-ready">Getting ready<a class="anchor-link" href="imputing-missing-values-through-various-strategies.html#Getting-ready">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>处理缺失值的第一步是创建缺失值。Numpy可以很方便的实现：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [2]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn</span> <span class="k">import</span> <span class="n">datasets</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="n">iris</span> <span class="o">=</span> <span class="n">datasets</span><span class="o">.</span><span class="n">load_iris</span><span class="p">()</span>
<span class="n">iris_X</span> <span class="o">=</span> <span class="n">iris</span><span class="o">.</span><span class="n">data</span>
<span class="n">masking_array</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">binomial</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">.</span><span class="mi">25</span><span class="p">,</span> <span class="n">iris_X</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">bool</span><span class="p">)</span>
<span class="n">iris_X</span><span class="p">[</span><span class="n">masking_array</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>让我们看看这几行代码，Numpy和平时用法不太一样，这里是在数组中用了一个数组作为索引。为了创建了随机的缺失值，先创建一个随机布尔值数组，其形状和<code>iris_X</code>数据集的维度相同。然后，根据布尔值数组分配缺失值。因为每次运行都是随机数据，所以<code>masking_array</code>每次都会不同。</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [3]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">masking_array</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[3]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[False,  True, False, False],
       [False,  True, False, False],
       [False, False, False, False],
       [ True, False, False,  True],
       [False, False,  True, False]], dtype=bool)</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [4]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">iris_X</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[4]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 5.1,  nan,  1.4,  0.2],
       [ 4.9,  nan,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ nan,  3.1,  1.5,  nan],
       [ 5. ,  3.6,  nan,  0.2]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="How-to-do-it...">How to do it...<a class="anchor-link" href="imputing-missing-values-through-various-strategies.html#How-to-do-it...">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>本书贯穿始终的一条原则（由于scikit-learn的存在）就是那些拟合与转换数据集的类都是可用的，可以在其他数据集中继续使用。具体演示如下所示：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [5]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn</span> <span class="k">import</span> <span class="n">preprocessing</span>
<span class="n">impute</span> <span class="o">=</span> <span class="n">preprocessing</span><span class="o">.</span><span class="n">Imputer</span><span class="p">()</span>
<span class="n">iris_X_prime</span> <span class="o">=</span> <span class="n">impute</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">iris_X</span><span class="p">)</span>
<span class="n">iris_X_prime</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[5]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 5.1       ,  3.05221239,  1.4       ,  0.2       ],
       [ 4.9       ,  3.05221239,  1.4       ,  0.2       ],
       [ 4.7       ,  3.2       ,  1.3       ,  0.2       ],
       [ 5.86306306,  3.1       ,  1.5       ,  1.21388889],
       [ 5.        ,  3.6       ,  3.82685185,  0.2       ]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>注意<code>[3,0]</code>位置的不同：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [6]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">iris_X_prime</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[6]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>5.8630630630630645</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [8]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">iris_X</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[8]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>nan</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="How-it-works...">How it works...<a class="anchor-link" href="imputing-missing-values-through-various-strategies.html#How-it-works...">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>上面的计算可以通过不同的方法实现。默认是均值<code>mean</code>，一共是三种：</p>
<ul>
<li>均值<code>mean</code>（默认方法）</li>
<li>中位数<code>median</code>
</li>
<li>众数<code>most_frequent</code>
</li>
</ul>
<p>scikit-learn会用指定的方法计算数据集中的每个缺失值，然后把它们填充好。</p>
<p>例如，用<code>median</code>方法重新计算<code>iris_X</code>，重新初始化<code>impute</code>即可：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [9]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">impute</span> <span class="o">=</span> <span class="n">preprocessing</span><span class="o">.</span><span class="n">Imputer</span><span class="p">(</span><span class="n">strategy</span><span class="o">=</span><span class="s1">'median'</span><span class="p">)</span>
<span class="n">iris_X_prime</span> <span class="o">=</span> <span class="n">impute</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">iris_X</span><span class="p">)</span>
<span class="n">iris_X_prime</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[9]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 5.1 ,  3.  ,  1.4 ,  0.2 ],
       [ 4.9 ,  3.  ,  1.4 ,  0.2 ],
       [ 4.7 ,  3.2 ,  1.3 ,  0.2 ],
       [ 5.8 ,  3.1 ,  1.5 ,  1.3 ],
       [ 5.  ,  3.6 ,  4.45,  0.2 ]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>如果数据有缺失值，后面计算过程中可能会出问题。例如，在<em>How to do it...</em>一节里面，<code>np.nan</code>作为默认缺失值，但是缺失值有很多表现形式。有时用<code>-1</code>表示。为了处理这些缺失值，可以在方法中指定那些值是缺失值。方法默认缺失值表现形式是<code>Nan</code>，就是<code>np.nan</code>的值。</p>
<p>假设我们将<code>iris_X</code>的缺失值都用<code>-1</code>表示。看着很奇怪，但是<code>iris</code>数据集的度量值不可能是负数，因此用<code>-1</code>表示缺失值完全合理：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [10]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">iris_X</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">isnan</span><span class="p">(</span><span class="n">iris_X</span><span class="p">)]</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">iris_X</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[10]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 5.1, -1. ,  1.4,  0.2],
       [ 4.9, -1. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [-1. ,  3.1,  1.5, -1. ],
       [ 5. ,  3.6, -1. ,  0.2]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>填充这些缺失值也很简单：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [11]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">impute</span> <span class="o">=</span> <span class="n">preprocessing</span><span class="o">.</span><span class="n">Imputer</span><span class="p">(</span><span class="n">missing_values</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="n">iris_X_prime</span> <span class="o">=</span> <span class="n">impute</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">iris_X</span><span class="p">)</span>
<span class="n">iris_X_prime</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[11]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 5.1       ,  3.05221239,  1.4       ,  0.2       ],
       [ 4.9       ,  3.05221239,  1.4       ,  0.2       ],
       [ 4.7       ,  3.2       ,  1.3       ,  0.2       ],
       [ 5.86306306,  3.1       ,  1.5       ,  1.21388889],
       [ 5.        ,  3.6       ,  3.82685185,  0.2       ]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="There's-more...">There's more...<a class="anchor-link" href="imputing-missing-values-through-various-strategies.html#There's-more...">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>pandas库也可以处理缺失值，而且更加灵活，但是重用性较弱：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [12]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="n">iris_X</span><span class="p">[</span><span class="n">masking_array</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span>
<span class="n">iris_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">iris_X</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="n">iris</span><span class="o">.</span><span class="n">feature_names</span><span class="p">)</span>
<span class="n">iris_df</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="n">iris_df</span><span class="o">.</span><span class="n">mean</span><span class="p">())[</span><span class="s1">'sepal length (cm)'</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[12]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>0    5.100000
1    4.900000
2    4.700000
3    5.863063
4    5.000000
Name: sepal length (cm), dtype: float64</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>其灵活性在于，<code>fillna</code>可以填充任意统计参数值：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [13]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">iris_df</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="n">iris_df</span><span class="o">.</span><span class="n">max</span><span class="p">())[</span><span class="s1">'sepal length (cm)'</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[13]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>0    5.1
1    4.9
2    4.7
3    7.9
4    5.0
Name: sepal length (cm), dtype: float64</pre>
</div>

</div>

</div>
</div>

</div>
    </div>
  </div>

    </div>
    <aside class="postpromonav"><nav><ul itemprop="keywords" class="tags">
<li><a class="tag p-category" href="../categories/chs.html" rel="tag">CHS</a></li>
            <li><a class="tag p-category" href="../categories/ipython.html" rel="tag">ipython</a></li>
            <li><a class="tag p-category" href="../categories/machine-learning.html" rel="tag">Machine Learning</a></li>
            <li><a class="tag p-category" href="../categories/python.html" rel="tag">Python</a></li>
            <li><a class="tag p-category" href="../categories/scikit-learn-cookbook.html" rel="tag">scikit-learn cookbook</a></li>
        </ul>
<ul class="pager hidden-print">
<li class="previous">
                <a href="getting-sample-data-from-external-sources.html" rel="prev" title="getting-sample-data-from-external-sources">Previous post</a>
            </li>
            <li class="next">
                <a href="kernel-pca-for-nonlinear-dimensionality-reduction.html" rel="next" title="kernel-pca-for-nonlinear-dimensionality-reduction">Next post</a>
            </li>
        </ul></nav></aside><script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script><script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    },
    displayAlign: 'center', // Change this to 'center' to center equations.
    "HTML-CSS": {
        styles: {'.MathJax_Display': {"margin": 0}}
    }
});
</script></article>
</div>
        <!--End of body content-->

        <footer id="footer">
            Contents © 2017         <a href="mailto:muxuezi@gmail.com">Tao Junjie</a> - Powered by         <a href="https://getnikola.com" rel="nofollow">Nikola</a>         
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0">
<img alt="Creative Commons License BY-NC-SA" style="border-width:0; margin-bottom:12px;" src="http://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png"></a>
            
        </footer>
</div>
</div>


            <script src="../assets/js/all-nocdn.js"></script><script>$('a.image-reference:not(.islink) img:not(.islink)').parent().colorbox({rel:"gal",maxWidth:"100%",maxHeight:"100%",scalePhotos:true});</script><!-- fancy dates --><script>
    moment.locale("en");
    fancydates(0, "YYYY-MM-DD HH:mm");
    </script><!-- end fancy dates --><script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-51330059-1', 'auto');
  ga('send', 'pageview');

</script>
</body>
</html>
